5 research outputs found

    Built to Last or Built Too Fast? Evaluating Prediction Models for Build Times

    Automated builds are integral to the Continuous Integration (CI) software development practice. In CI, developers are encouraged to integrate early and often. However, long build times can be an issue when integrations are frequent. This research focuses on finding a balance between integrating often and keeping developers productive. We propose and analyze models that can predict the build time of a job. Such models can help developers better manage their time and tasks. Project managers can also explore different factors to determine the best setup for a build job that will keep the build wait time at an acceptable level. Software organizations transitioning to CI practices can use the predictive models to anticipate build times before CI is implemented. The research community can modify our predictive models to further understand the factors and relationships affecting build times. Comment: A 4-page version was published in the Proceedings of the IEEE/ACM 14th International Conference on Mining Software Repositories (MSR 2017), pages 487-490.
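    As a rough illustration of the kind of prediction model the abstract describes, the sketch below fits a regression model to synthetic CI job records. This is only a sketch: the feature names (src_churn, num_commits, prev_build_secs, test_count), the synthetic data, and the choice of a random forest are assumptions for illustration, not the paper's actual feature set or modeling approach.

    # Illustrative sketch only: predicting CI build times with a regression model.
    # The features below are hypothetical stand-ins, not the paper's feature set.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    n = 500

    # Synthetic job records: each row is one CI build job.
    X = np.column_stack([
        rng.integers(1, 2000, n),    # src_churn: lines added + deleted
        rng.integers(1, 20, n),      # num_commits in the push
        rng.integers(30, 3600, n),   # prev_build_secs: previous build duration
        rng.integers(10, 5000, n),   # test_count: number of tests executed
    ])
    # Synthetic target: build duration in seconds (toy relationship + noise).
    y = 0.5 * X[:, 0] + 0.8 * X[:, 2] + 0.1 * X[:, 3] + rng.normal(0, 60, n)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    pred = model.predict(X_test)
    print(f"MAE: {mean_absolute_error(y_test, pred):.1f} seconds")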

    On utilizing an enhanced object partitioning scheme to optimize self-organizing lists-on-lists

    Author's accepted manuscript. This is a post-peer-review, pre-copyedit version of an article published in Evolving Systems. The final authenticated version is available online at: http://dx.doi.org/10.1007/s12530-020-09327-4

    Collaborative Cloud Computing Framework for Health Data with Open Source Technologies

    The proliferation of sensor technologies and advancements in data collection methods have enabled the accumulation of very large amounts of data. Increasingly, these datasets are considered for scientific research. However, designing a system architecture that achieves high performance in terms of parallelization, query processing time, and aggregation of heterogeneous data types (e.g., time series, images, and structured data), while keeping scientific research reproducible, remains a major challenge. This is specifically true for health sciences research, where the systems must be i) easy to use, with the flexibility to manipulate data at the most granular level, ii) agnostic of the programming-language kernel, iii) scalable, and iv) compliant with the HIPAA privacy law. In this paper, we review the existing literature on big data systems for scientific research in health sciences and identify gaps in the current system landscape. We propose a novel architecture for a software-hardware-data ecosystem using open source technologies such as Apache Hadoop, Kubernetes, and JupyterHub in a distributed environment. We also evaluate the system using a large clinical data set of 69M patients. Comment: This paper was accepted at ACM-BCB 202
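    A minimal sketch of how a parallel query over such a clinical dataset might look from a JupyterHub notebook kernel. It assumes a Spark runtime deployed on the cluster, which the abstract does not name (it mentions Hadoop, Kubernetes, and JupyterHub only); the HDFS path and the column names (patient_id, diagnosis_code, encounter_date) are hypothetical.

    # Illustrative sketch only: a distributed aggregation over de-identified
    # encounter records stored as Parquet on HDFS (Apache Hadoop). Spark as the
    # compute engine, the path, and the schema are assumptions, not the paper's.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("clinical-cohort-counts")
             .getOrCreate())

    encounters = spark.read.parquet("hdfs:///data/clinical/encounters.parquet")

    # Count distinct patients per diagnosis code for encounters after 2015,
    # executed in parallel across the cluster's executors.
    cohort_counts = (encounters
                     .filter(F.col("encounter_date") >= "2015-01-01")
                     .groupBy("diagnosis_code")
                     .agg(F.countDistinct("patient_id").alias("n_patients"))
                     .orderBy(F.col("n_patients").desc()))

    cohort_counts.show(20)
    spark.stop()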

    On utilizing an enhanced object partitioning scheme to optimize self-organizing lists-on-lists

    With the advent of “Big Data” as a field in and of itself, there are at least three fundamentally new questions that have emerged, namely the Artificial Intelligence (AI)-based algorithms required, the hardware to process the data, and the methods to store and access the data efficiently. This paper (The work of the second author was partially supported by NSERC, the Natural Sciences and Engineering Research Council of Canada. We are very grateful for the feedback from the anonymous Referees of the original submission. Their input significantly improved the quality of this final version.) presents some novel schemes for the last of the three areas. There have been thousands of papers written regarding the algorithms themselves, and the hardware vendors are scrambling for market share. However, the question of how to store, manage and access data, which has been central to the field of Computer Science, is even more pertinent in these days when megabytes of data are being generated every second. This paper considers the problem of minimizing the cost of retrieval using the most fundamental data structure, i.e., a Singly-Linked List (SLL). We consider an SLL in which the elements are accessed by a Non-stationary Environment (NSE) exhibiting the so-called “Locality of Reference”. We propose a solution to the problem by designing an “Adaptive” Data Structure (ADS), which is created by utilizing a composite of hierarchical data “sub”-structures to constitute the overall data structure (In this paper, the primitive data structure is the SLL. However, these concepts can be extended to more complicated data structures.). In this paper, we design hierarchical Lists-on-Lists (LOLs) by assembling SLLs into a hierarchical scheme that results in SLLs on SLLs (SLLs-on-SLLs), comprising an outer-list context and sublist contexts. The goal is that elements that are more likely to be accessed together are grouped within the same sub-context, while the sublists themselves are moved “en masse” towards the head of the list-context to minimize the overall access cost. This move is carried out by employing the “de facto” list re-organization schemes, i.e., the Move-To-Front (MTF) and Transposition (TR) rules. To achieve the clustering of elements within the sublists, we invoke the Object Migration Automaton (OMA) family of reinforcement schemes from the theory of Learning Automata (LA). These capture the probabilistic dependence of the elements in the data structure, which receives query accesses from the Environment. We show that SLLs-on-SLLs augmented with the Enhanced OMA (EOMA) minimize the retrieval cost for elements in NSEs, and are superior to the stand-alone MTF and TR schemes, as well as to the OMA-augmented SLLs-on-SLLs operating in such Environments.
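    A minimal sketch of the hierarchical Move-To-Front idea described above, assuming a fixed partition of elements into sublists. The OMA/EOMA reinforcement scheme that migrates co-accessed elements into the same sublist is omitted here, and plain Python lists stand in for the singly-linked lists, so this illustrates only the two-level re-organization, not the paper's full method.

    # Illustrative sketch only: lists-on-lists with MTF applied at both levels.
    # The adaptive (OMA/EOMA) clustering of elements into sublists is omitted;
    # the partition is fixed at construction time.

    def make_lol(elements, sublist_size):
        """Partition the elements into sublists of (at most) sublist_size."""
        return [list(elements[i:i + sublist_size])
                for i in range(0, len(elements), sublist_size)]

    def access(lol, key):
        """Sequentially search for key; on a hit, apply MTF within the sublist
        and move the whole sublist "en masse" to the head of the outer list.
        Returns the number of comparisons (a proxy for retrieval cost)."""
        cost = 0
        for i, sub in enumerate(lol):
            for j, item in enumerate(sub):
                cost += 1
                if item == key:
                    sub.insert(0, sub.pop(j))   # MTF inside the sublist context
                    lol.insert(0, lol.pop(i))   # MTF of the sublist in the outer list
                    return cost
        return cost  # key not present

    # Toy usage: a skewed access sequence exhibiting locality of reference.
    lol = make_lol(list("abcdefgh"), sublist_size=2)
    queries = list("ggggghhhgabg")
    total = sum(access(lol, q) for q in queries)
    print(lol, "total comparisons:", total)

    On skewed access sequences like the toy one above, the per-query comparison count drops quickly once the frequently accessed elements and their host sublists have migrated towards the front of their respective lists.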